Feat/litellm passthrough #199 #201

Closed
endre82 wants to merge 2 commits into rynfar:main from endre82:feat/litellm-passthrough

endre82 (Contributor) commented Mar 31, 2026

#199

feat: add LiteLLM passthrough adapter with x-litellm-* header detection

  • Add tsx as dev dependency and update supervisor to prefer bun > tsx > npx
  • Detect LiteLLM by user-agent header (litellm/*) in addition to x-litellm-* headers
  • Force stream=false for all LiteLLM requests (healthchecks don't send x-litellm-* headers)
  • Increase MAX_CONCURRENT_SESSIONS default from 10 to 50
  • Increase rate-limit retry attempts (2→3) and base delay (1s→2s) with exponential backoff
  • Allow rate-limit retry even after partial content was yielded
  • Add DEBUG_PROXY=true flag for detailed error diagnosis
  • Add prefersStreaming() to AgentAdapter interface (unused but available)
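
The detection rule in the second bullet can be sketched as a small predicate. This is a minimal sketch, assuming request headers are available as a plain name-to-value map (a hypothetical shape, not necessarily how the proxy exposes them). The name `isLiteLLMRequest` matches the function mentioned later in this thread, but the body here is illustrative only.

```typescript
// Hypothetical shape: request headers as a plain map, names in any casing.
type HeaderMap = Record<string, string>;

// Sketch of LiteLLM detection: a litellm/* User-Agent OR any x-litellm-* header.
export function isLiteLLMRequest(headers: HeaderMap): boolean {
  let userAgent = "";
  for (const [name, value] of Object.entries(headers)) {
    const key = name.toLowerCase(); // HTTP header names are case-insensitive
    if (key.startsWith("x-litellm-")) return true;
    if (key === "user-agent") userAgent = value;
  }
  // Covers requests (e.g. healthchecks) that carry no x-litellm-* headers.
  return userAgent.toLowerCase().startsWith("litellm/");
}
```

For example, `isLiteLLMRequest({ "User-Agent": "litellm/1.0" })` returns `true`, while `isLiteLLMRequest({ "user-agent": "curl/8.5.0" })` returns `false`.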

endre82 added 2 commits March 31, 2026 10:41
… rate-limit retries

rynfar (Owner) commented Apr 1, 2026

Thanks for putting this together — the LiteLLM adapter concept is solid and we want to get it in. However the PR bundles several unrelated changes that need to be separated before we can merge anything.

What we're pulling out and merging separately:
The core LiteLLM adapter (adapters/passthrough.ts, detection in detect.ts, tests) is good and we'll get that in as a clean PR.

What we're not merging from this PR, and why:

MAX_CONCURRENT_SESSIONS 10→50 — You can already control this yourself via MERIDIAN_MAX_CONCURRENT env var, so no code change is needed. That said, if you're hitting concurrency limits with LiteLLM specifically, bumping this is a reasonable thing to try — just be aware higher values risk process instability since each SDK spawn is ~11MB. We've noted this in the LiteLLM docs we're adding in the clean PR.
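
To make the trade-off concrete: at roughly 11MB per SDK spawn, 10 concurrent sessions is on the order of 110MB and 50 is on the order of 550MB. The sketch below shows one way such a cap could be read from MERIDIAN_MAX_CONCURRENT. `resolveMaxConcurrent` and `estimatedSpawnMemoryMB` are hypothetical helpers, and falling back to 10 assumes the current default mentioned in this PR.

```typescript
// Assumed current default (the PR proposed raising it from 10 to 50).
const DEFAULT_MAX_CONCURRENT_SESSIONS = 10;
// Rough per-spawn memory figure quoted in the review.
const APPROX_MB_PER_SPAWN = 11;

// Hypothetical helper: read the cap from MERIDIAN_MAX_CONCURRENT,
// falling back to the default when unset, non-numeric, or not positive.
export function resolveMaxConcurrent(env: Record<string, string | undefined>): number {
  const raw = env.MERIDIAN_MAX_CONCURRENT;
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_CONCURRENT_SESSIONS;
}

// Back-of-the-envelope memory estimate if every session has a live spawn.
export function estimatedSpawnMemoryMB(sessions: number): number {
  return sessions * APPROX_MB_PER_SPAWN;
}
```

In a Node or Bun process this would be called as `resolveMaxConcurrent(process.env)`.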

Rate-limit retry after partial content — The original guard ("Never retry after response content was yielded — response is committed") was intentional. Removing it for rate-limit errors risks corrupting or duplicating an in-flight SSE stream. The client has already received partial events — resuming on the same connection isn't safe. This needs more careful thought as a standalone change.
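
The guard being preserved can be expressed as a retry loop that retries only while nothing has reached the client, using exponential backoff between attempts. This is a minimal sketch, not the project's code: `RateLimitError`, `streamWithRetry`, and the option names are hypothetical, and the example values (3 retries, 2s base delay) are the ones proposed in this PR.

```typescript
// Hypothetical error type marking an upstream rate-limit failure.
class RateLimitError extends Error {}

interface RetryOptions {
  maxRetries: number;  // retries after the first attempt
  baseDelayMs: number; // delay before the first retry; doubles each time
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-runs `start` on rate limits, but only while no chunk has been yielded.
export async function* streamWithRetry<T>(
  start: () => AsyncIterable<T>,
  { maxRetries, baseDelayMs, sleep = defaultSleep }: RetryOptions,
): AsyncGenerator<T> {
  for (let attempt = 0; ; attempt++) {
    let committed = false; // true once the client has received any content
    try {
      for await (const chunk of start()) {
        committed = true;
        yield chunk;
      }
      return;
    } catch (err) {
      // Never retry after content was yielded: the response is committed.
      if (committed || !(err instanceof RateLimitError) || attempt >= maxRetries) {
        throw err;
      }
      // Exponential backoff: baseDelayMs, 2x, 4x, ... (2s, 4s, 8s for a 2s base).
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

Keeping `committed` per attempt means a rate limit that arrives mid-stream surfaces as an error to the caller instead of silently replaying events on a connection that has already sent partial output.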

tsx dev dependency — Adds 500+ lines to the lock file for a dev convenience script. The project already has bun run ./bin/cli.ts for development. Not needed.

DEBUG_PROXY env var — The project already has CLAUDE_PROXY_DEBUG routed through claudeLog. Adding a second debug mechanism that writes raw console.error directly is inconsistent, and one of the debug lines isn't guarded by the flag at all (it fires on every rate limit for every user). We'll incorporate the useful cache visibility into the existing debug mechanism instead.
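
Routing every diagnostic line through one guard is what prevents a line from firing for users who never opted in. A minimal sketch, assuming a logging sink with the hypothetical signature below (the real claudeLog signature is not shown in this PR) and assuming CLAUDE_PROXY_DEBUG is enabled by the values "1" or "true".

```typescript
// Hypothetical sink signature standing in for claudeLog.
type LogSink = (message: string, data?: Record<string, unknown>) => void;

// Builds a logger that drops every message unless CLAUDE_PROXY_DEBUG is set.
// Assumption: "1" and "true" (any casing) are the enabling values.
export function makeDebugLogger(
  env: Record<string, string | undefined>,
  sink: LogSink,
): LogSink {
  const flag = (env.CLAUDE_PROXY_DEBUG ?? "").toLowerCase();
  const enabled = flag === "1" || flag === "true";
  return (message, data) => {
    if (enabled) sink(message, data);
  };
}
```

Call sites would then log cache or rate-limit details through the returned function only, so adding a new diagnostic cannot bypass the flag.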

prefersStreaming() on the adapter interface — Added but never called in server.ts. The stream override is done by duplicating the LiteLLM header detection inline in server.ts instead. We'll wire it up properly so the adapter controls its own streaming preference.
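
Letting the adapter control its own streaming preference amounts to detecting the adapter first and then consulting an optional hook, as the follow-up commit describes with `adapter.prefersStreaming?.(body)`. The sketch below uses trimmed-down, hypothetical versions of the interface and body types; `resolveStream` and `defaultAdapter` are illustrative names, and treating a missing `stream` field as `false` is an assumption.

```typescript
// Hypothetical, trimmed-down request body (the real one has more fields).
interface RequestBody {
  stream?: boolean;
}

// Hypothetical, trimmed-down adapter interface.
interface AgentAdapter {
  name: string;
  usesPassthrough(): boolean;
  // Optional hook: when implemented, its result overrides the client's stream flag.
  prefersStreaming?(body: RequestBody): boolean;
}

const liteLLMAdapter: AgentAdapter = {
  name: "litellm",
  usesPassthrough: () => true,
  prefersStreaming: () => false, // LiteLLM requests are forced to non-streaming
};

const defaultAdapter: AgentAdapter = {
  name: "default",
  usesPassthrough: () => false,
};

// Runs after adapter detection: the adapter decides first, else the body's flag.
export function resolveStream(adapter: AgentAdapter, body: RequestBody): boolean {
  return adapter.prefersStreaming?.(body) ?? (body.stream === true);
}
```

With this shape, `server.ts` needs no LiteLLM-specific branch: any adapter that does not implement the hook leaves the client's choice untouched.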

We'll post the clean PR shortly and reference this one.

rynfar added a commit that referenced this pull request Apr 1, 2026
Auto-detects LiteLLM requests via litellm/* User-Agent or x-litellm-*
headers and routes them to a dedicated passthrough adapter.

- adapters/passthrough.ts: LiteLLM adapter — usesPassthrough()=true,
  prefersStreaming()=false, x-litellm-session-id for session continuity,
  <env cwd=...> extraction, mcp__litellm__* tool naming
- adapters/detect.ts: isLiteLLMRequest() detection, passthrough adapter
  wired in as priority 3 (after Droid and Crush, before OpenCode fallback)
- adapter.ts: add optional prefersStreaming(body) to AgentAdapter interface
- server.ts: move detectAdapter before stream determination; use
  adapter.prefersStreaming?.(body) to allow adapters to override stream
  setting (replaces the previous inline LiteLLM header duplication)
- proxy-litellm-adapter.test.ts: 32 new tests covering adapter behaviour
  and detectAdapter routing
- adapter-detection.test.ts: fix header() mock to handle no-arg call
  (isLiteLLMRequest calls header() with no args to inspect all headers)
- README.md: LiteLLM setup section, tested agents table entry,
  passthrough.ts in architecture module map

Closes #199. Based on original work in PR #201 by @endre82.
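
The adapter duties in the first bullet of this commit message can be illustrated with three small helpers. This is a hedged sketch: the function names are hypothetical, the exact `<env cwd=...>` syntax is an assumption (the commit message elides the value), and the tool-name mapping is assumed to be a simple prefix.

```typescript
// Hypothetical helpers illustrating the passthrough adapter's duties; not the shipped code.
type HeaderMap = Record<string, string>;

// Session continuity: reuse LiteLLM's session id as the conversation key, if present.
export function getLiteLLMSessionId(headers: HeaderMap): string | undefined {
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() === "x-litellm-session-id" && value.trim() !== "") {
      return value.trim();
    }
  }
  return undefined;
}

// Working directory: assumes the prompt carries <env cwd="..."> (quoted or bare value).
const ENV_CWD = /<env\s+cwd=(?:"([^"]*)"|'([^']*)'|([^\s>]+))/;

export function extractCwd(text: string): string | undefined {
  const m = ENV_CWD.exec(text);
  // Exactly one capture group participates in a match; the others are undefined.
  return m ? (m[1] ?? m[2] ?? m[3]) : undefined;
}

// Tool naming: assumes a plain mcp__litellm__ prefix in front of the tool's name.
export function liteLLMToolName(tool: string): string {
  return `mcp__litellm__${tool}`;
}
```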

rynfar (Owner) commented Apr 1, 2026

Closing this out now that the LiteLLM adapter has landed in #215 (shipped in v1.24.0). The core work here — adapter detection, passthrough mode, session continuity via x-litellm-session-id — is all in. Thanks again @endre82, the original PR was a solid foundation.

rynfar closed this Apr 1, 2026